Fast tandem mass spectra-based protein identification regardless of the number of spectra or potential modifications examined
نویسندگان
چکیده
MOTIVATION Comparing tandem mass spectra (MSMS) against a known dataset of protein sequences is a common method for identifying unknown proteins; however, the processing of MSMS by current software often limits certain applications, including comprehensive coverage of post-translational modifications, non-specific searches and real-time searches to allow result-dependent instrument control. This problem deserves attention as new mass spectrometers provide the ability for higher throughput and as known protein datasets rapidly grow in size. New software algorithms need to be devised in order to address the performance issues of conventional MSMS protein dataset-based protein identification. METHODS This paper describes a novel algorithm based on converting a collection of monoisotopic, centroided spectra to a new data structure, named 'peptide finite state machine' (PFSM), which may be used to rapidly search a known dataset of protein sequences, regardless of the number of spectra searched or the number of potential modifications examined. The algorithm is verified using a set of commercially available tryptic digest protein standards analyzed using an ABI 4700 MALDI TOFTOF mass spectrometer, and a free, open source PFSM implementation. It is illustrated that a PFSM can accurately search large collections of spectra against large datasets of protein sequences (e.g. NCBI nr) using a regular desktop PC; however, this paper only details the method for identifying peptide and subsequently protein candidates from a dataset of known protein sequences. The concept of using a PFSM as a peptide pre-screening technique for MSMS-based search engines is validated by using PFSM with Mascot and XTandem. AVAILABILITY Complete source code, documentation and examples for the reference PFSM implementation are freely available at the Proteome Commons, http://www.proteomecommons.org and source code may be used both commercially and non-commercially as long as the original authors are credited for their work.
منابع مشابه
DeltAMT: a statistical algorithm for fast detection of protein modifications from LC-MS/MS data.
Identification of proteins and their modifications via liquid chromatography-tandem mass spectrometry is an important task for the field of proteomics. However, because of the complexity of tandem mass spectra, the majority of the spectra cannot be identified. The presence of unanticipated protein modifications is among the major reasons for the low spectral identification rate. The conventiona...
متن کاملOn Comparison of SimTandem with State-of-the-Art Peptide Identification Tools, Efficiency of Precursor Mass Filter and Dealing with Variable Modifications
The similarity search in theoretical mass spectra generated from protein sequence databases is a widely accepted approach for identification of peptides from query mass spectra produced by shotgun proteomics. Growing protein sequence databases and noisy query spectra demand database indexing techniques and better similarity measures for the comparison of theoretical spectra against query spectr...
متن کاملDeltAMT: A Statistical Algorithm for Fast Detection of Protein Modifications From LC-MS/MS Data*□S
Identification of proteins and their modifications via liquid chromatography-tandem mass spectrometry is an important task for the field of proteomics. However, because of the complexity of tandem mass spectra, the majority of the spectra cannot be identified. The presence of unanticipated protein modifications is among the major reasons for the low spectral identification rate. The conventiona...
متن کاملSpectral networks: a new approach to de novo discovery of protein sequences and posttranslational modifications.
Significant technological advances have accelerated high-throughput proteomics to the automated generation of millions of tandem mass spectra on a daily basis. In such a setup, the desire for greater sequence coverage combines with standard experimental procedures to commonly yield multiple tandem mass spectra from overlapping peptides-typical observations include peptides differing by one or t...
متن کاملPreprocessing of tandem mass spectra using machine learning methods
Protein identification has been more helpful than before in the diagnosis and treatment of many diseases, such as cancer, heart disease and HIV. Tandem mass spectrometry is a powerful tool for protein identification. In a typical experiment, proteins are broken into small amino acid oligomers called peptides. By determining the amino acid sequence of several peptides of a protein, its whole ami...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Bioinformatics
دوره 21 10 شماره
صفحات -
تاریخ انتشار 2005